Conditional stochastic optimization and applications
Review for NeurIPS paper: Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning
Strengths: To the best of my knowledge, the BSGD algorithm is the first stochastic-gradient-based algorithm that directly solves the CSO problem itself. The two most relevant works that focus on CSO are [12] and [24]; [12] solves a saddle-point reformulation of CSO, while [24] provides sample complexities for the SAA approach to the general CSO problem. Compared with the SAA approach presented in [24], the BSGD method improves the sample complexities (removing the dependence on the dimension d) when F is general convex, matching the lower bounds the authors provide. Although BSGD is not optimal when F is strongly convex and smooth, it matches the complexities of the SAA approach [24]. The authors also discuss the settings in which BSGD may not be optimal, providing a transparent evaluation of their algorithm.
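For reference, and as I understand the setup in [12] and [24], the CSO objective discussed above has the nested form

\[
\min_{x \in \mathcal{X}} \; F(x) \;=\; \mathbb{E}_{\xi}\!\left[ f_{\xi}\!\big( \mathbb{E}_{\eta \mid \xi}\left[ g_{\eta}(x;\xi) \right] \big) \right],
\qquad
\nabla F(x) \;=\; \mathbb{E}_{\xi}\!\left[ \big( \mathbb{E}_{\eta \mid \xi}\!\left[ \nabla g_{\eta}(x;\xi) \right] \big)^{\top} \nabla f_{\xi}\!\big( \mathbb{E}_{\eta \mid \xi}\left[ g_{\eta}(x;\xi) \right] \big) \right].
\]

Because \nabla f_{\xi} is evaluated at an inner conditional expectation, averaging over a single \eta sample does not yield an unbiased gradient estimate; this is the obstruction that BSGD sidesteps by accepting a controlled bias.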
Meta-review for NeurIPS paper: Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning
During the rebuttal phase the reviewers did not come to a consensus. Two reviewers, especially R4, believe that the paper should not be accepted since the authors base their analysis on well-known ideas. The other two think that the results are sufficiently novel and of interest to the NeurIPS community. I tend to agree with the latter, so I will therefore accept.
Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning
Conditional stochastic optimization covers a variety of applications ranging from invariant learning and causal inference to meta-learning. However, constructing unbiased gradient estimators for such problems is challenging due to the composition structure. As an alternative, we propose a biased stochastic gradient descent (BSGD) algorithm and study the bias-variance tradeoff under different structural assumptions. We establish the sample complexities of BSGD for strongly convex, convex, and weakly convex objectives under smooth and non-smooth conditions. Our lower bound analysis shows that the sample complexities of BSGD cannot be improved for general convex objectives and nonconvex objectives, except for smooth nonconvex objectives with a Lipschitz continuous gradient estimator.
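To make the bias-variance tradeoff concrete, here is a minimal sketch of a BSGD-style update on a toy one-dimensional CSO instance; the instance, batch sizes, and step size are my own illustrative assumptions, not the paper's algorithm details or experiments.

import numpy as np

rng = np.random.default_rng(0)

# Toy one-dimensional CSO instance (an illustrative assumption, not from the paper):
#   inner map   g_eta(x; xi) = x + xi + eta,  with xi ~ N(0, 1) and eta ~ N(0, 1) given xi
#   outer loss  f_xi(y)      = exp(y) - y
# Then F(x) = E_xi[ f_xi( E_eta[g_eta(x; xi)] ) ] = exp(x + 1/2) - x,
# whose minimizer is x* = -1/2.

def bsgd_gradient(x, inner_batch):
    """One BSGD gradient estimate: a single outer sample xi and `inner_batch`
    inner samples of eta. Plugging the inner sample mean into f' makes the
    estimate biased; the bias shrinks as inner_batch grows."""
    xi = rng.normal()
    eta = rng.normal(size=inner_batch)
    y_bar = x + xi + eta.mean()      # estimate of E_eta[g_eta(x; xi)]
    return np.exp(y_bar) - 1.0       # chain rule: f'(y_bar) * dg/dx, with dg/dx = 1

def bsgd(inner_batch, steps=20000, lr=0.01):
    """Plain SGD with the biased gradient estimate; returns a tail-iterate
    average to damp the stochastic noise around the limit point."""
    x, tail = 0.0, []
    for t in range(steps):
        x -= lr * bsgd_gradient(x, inner_batch)
        if t >= steps // 2:
            tail.append(x)
    return float(np.mean(tail))

x_star = -0.5
for m in (1, 4, 16):                 # larger inner batch -> smaller bias
    xs = [bsgd(m) for _ in range(5)] # a few runs to average out the variance
    print(f"inner batch {m:3d}: mean gap to x* = {np.mean(xs) - x_star:+.3f}")

In this toy instance the limit point of the biased dynamics sits roughly 1/(2m) away from the true minimizer for an inner batch of size m, which is one way to see the tradeoff the abstract describes: larger inner batches reduce the bias but increase the per-iteration sample cost.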